Memory Management Strategies for Single-Pass Index Construction in Text Retrieval Systems
Abstract
Many text retrieval systems construct their index by accumulating postings in main memory until there is no more memory available and then creating an on-disk index from the in-memory data. When the entire text collection has been read, all on-disk indices are combined into one big index through a multiway merge process. This paper discusses several ways to arrange postings in memory and studies the effects on memory requirements and indexing performance. Starting from the traditional approach that holds all postings for one term in a linked list, we examine strategies for combining consecutive postings into posting groups and arranging these groups in a linked list in order to reduce the number of pointers in the linked list. We then extend these techniques to compressed posting lists and finally evaluate the effects they have on overall indexing performance for both static and dynamic text collections. Substantial improvements are achieved over the initial approach.
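The ideas in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation. The names `PostingGroup`, `GroupedPostingList`, `build_partial_indices`, `flush`, and `merge_runs` are hypothetical, as is the fixed `group_size`. The memory budget is counted in postings rather than bytes, and each "on-disk index" is stood in for by a sorted in-memory list. The sketch covers only uncompressed postings: each term's postings live in a linked list of groups, so there is one pointer per group rather than one per posting, and the partial indices are combined with a multiway merge at the end.

```python
from collections import defaultdict
import heapq


class PostingGroup:
    """A block of consecutive postings that shares a single 'next' pointer."""
    __slots__ = ("postings", "capacity", "next")

    def __init__(self, capacity):
        self.postings = []
        self.capacity = capacity
        self.next = None


class GroupedPostingList:
    """Linked list of posting groups: one pointer per group, not per posting."""

    def __init__(self, group_size=4):
        self.group_size = group_size  # assumed fixed capacity per group
        self.head = self.tail = PostingGroup(group_size)
        self.num_groups = 1

    def append(self, posting):
        # Allocate a new group only when the current tail group is full.
        if len(self.tail.postings) == self.tail.capacity:
            new_group = PostingGroup(self.group_size)
            self.tail.next = new_group
            self.tail = new_group
            self.num_groups += 1
        self.tail.postings.append(posting)

    def __iter__(self):
        group = self.head
        while group is not None:
            yield from group.postings
            group = group.next


def flush(index):
    """Turn the in-memory index into a term-sorted run (stand-in for a disk index)."""
    return [(term, list(index[term])) for term in sorted(index)]


def build_partial_indices(docs, max_postings_in_memory, group_size=4):
    """Accumulate postings; emit a run whenever the (assumed) budget is reached."""
    runs = []
    index = defaultdict(lambda: GroupedPostingList(group_size))
    count = 0
    for doc_id, text in docs:
        for position, term in enumerate(text.split()):
            index[term].append((doc_id, position))
            count += 1
        # The budget is checked at document boundaries to keep the sketch simple.
        if count >= max_postings_in_memory:
            runs.append(flush(index))
            index.clear()
            count = 0
    if index:
        runs.append(flush(index))
    return runs


def merge_runs(runs):
    """Multiway merge of term-sorted runs into one final index."""
    merged = {}
    # heapq.merge is stable, so postings from earlier runs stay ahead of later ones.
    for term, postings in heapq.merge(*runs, key=lambda entry: entry[0]):
        merged.setdefault(term, []).extend(postings)
    return merged
```

For example, `merge_runs(build_partial_indices(docs, max_postings_in_memory=4))` over a list of `(doc_id, text)` pairs returns a dictionary that maps each term to its `(doc_id, position)` postings in document order. A larger `group_size` means fewer pointers per posting, at the cost of unused slots in the last group of each list.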
Similar resources
Efficient online index maintenance for contiguous inverted lists (Nicholas Lester)
Search engines and other text retrieval systems use high-performance inverted indexes to provide efficient text query evaluation. Algorithms for fast query evaluation and index construction are well-known, but relatively little has been published concerning update. In this paper, we experimentally evaluate the two main alternative strategies for index maintenance in the presence of insertions, ...
Efficient single-pass index construction for text databases
Efficient construction of inverted indexes is essential to provision of search over large collections of text data. In this article, we review the principal approaches to inversion, analyze their theoretical cost, and present experimental results. We identify the drawbacks of existing inversion approaches and propose a single-pass inversion method that, in contrast to previous approaches, does ...
A Hybrid Approach to Index Maintenance in Dynamic Text Retrieval Systems
In-place and merge-based index maintenance are the two main competing strategies for on-line index construction in dynamic information retrieval systems based on inverted lists. Motivated by recent results for both strategies, we investigate possible combinations of in-place and merge-based index maintenance. We present a hybrid approach in which long posting lists are updated in-place, while s...